Regularization effect on autoencoder

1. Data prep
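The source does not name the dataset, but the 28x28 / 784-dim shapes used later suggest MNIST-style grey-scale digits. A minimal sketch of the prep, with randomly generated placeholder images standing in for the real data:

```python
import numpy as np

# Placeholder for MNIST-style raw images: 28x28 grey-scale pixels in [0, 255].
rng = np.random.default_rng(0)
raw_images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)

# Scale pixel values to [0, 1] and flatten each image to a 784-dim vector,
# matching the input x consumed by the encoder in the next section.
x = raw_images.reshape(len(raw_images), -1).astype(np.float32) / 255.0

print(x.shape)  # (100, 784)
```

Scaling to [0, 1] also matches the sigmoid output of the decoder, so input and reconstruction live in the same range.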

2. Model architecture

The model below minimizes a total loss $L_T$, composed of the regular autoencoder reconstruction loss $L$ plus a regularization term $R$.

Encoding layer:
$h = \alpha_e(W_1 x + b_1)$, where $\alpha_e(\cdot)$ is the ReLU activation function and $h$ has 196 hidden units. Hence $h$ is a 196x1 vector in a 196-dimensional latent space, $W_1$ is a 196x784 weight matrix, and $b_1$ is a 196x1 bias vector.
Dropout layer:
dropout_rate ranges from 0 to 0.9 and controls the fraction of hidden units in $h$ that are randomly dropped during training.
Decoding layer:
$x' = \alpha_d(W_2 h + b_2)$, where $\alpha_d(\cdot)$ is the sigmoid activation function and $x'$ is the output of the autoencoder, optimized to reconstruct the input $x$.
Loss function:
$L_T = L + R = \|x - x'\|^2$, i.e. $R = 0$: no explicit L1 or L2 penalty is added, and dropout is the only (implicit) source of regularization.
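The architecture above can be sketched as a single forward pass in numpy. The weights here are randomly initialized stand-ins, not trained parameters, and inverted dropout (rescaling the surviving units by $1/(1-\text{dropout\_rate})$) is one common convention the source does not specify:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the text: 784-dim input, 196 hidden units.
n_in, n_hidden = 784, 196

W1 = rng.normal(0, 0.05, size=(n_hidden, n_in))  # encoder weights, 196x784
b1 = np.zeros(n_hidden)                          # encoder bias, 196x1
W2 = rng.normal(0, 0.05, size=(n_in, n_hidden))  # decoder weights, 784x196
b2 = np.zeros(n_in)                              # decoder bias, 784x1

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, dropout_rate=0.5, training=True):
    # Encoding layer: h = ReLU(W1 x + b1)
    h = relu(W1 @ x + b1)
    if training and dropout_rate > 0.0:
        # Inverted dropout: zero each unit with probability dropout_rate,
        # rescale survivors so the expected activation is unchanged.
        mask = rng.random(h.shape) >= dropout_rate
        h = h * mask / (1.0 - dropout_rate)
    # Decoding layer: x' = sigmoid(W2 h + b2)
    x_prime = sigmoid(W2 @ h + b2)
    return h, x_prime

x = rng.random(n_in)
h, x_prime = forward(x, dropout_rate=0.5)

# Total loss L_T = ||x - x'||^2; dropout supplies the regularization, so
# no explicit penalty term appears here.
loss = np.sum((x - x_prime) ** 2)
print(h.shape, x_prime.shape)  # (196,) (784,)
```

In a real training run these weights would be fit by backpropagation on $L_T$; the sketch only shows how the layers compose.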

3. Training

4. Loss plot

MSE loss ($L_T$) against the hyperparameter dropout_rate, for the training set and testing set
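A minimal sketch of this plot, assuming the per-setting losses have already been collected; the loss values below are placeholders, not results:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, writes straight to file
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical results: one measured MSE loss per dropout_rate setting.
dropout_rates = np.arange(0.0, 1.0, 0.1)
train_loss = np.linspace(0.02, 0.08, len(dropout_rates))  # placeholder values
test_loss = train_loss + 0.005                            # placeholder values

fig, ax = plt.subplots()
ax.plot(dropout_rates, train_loss, marker="o", label="training set")
ax.plot(dropout_rates, test_loss, marker="s", label="testing set")
ax.set_xlabel("dropout_rate")
ax.set_ylabel(r"MSE loss $L_T$")
ax.legend()
fig.savefig("loss_vs_dropout_rate.png")
```

The sparsity plot in the next section follows the same pattern with sparsity of $h$ on the y-axis.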

5. Sparsity plot

Sparsity of $h$ against the hyperparameter dropout_rate, for the training set and testing set
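The text does not define the sparsity measure. One common choice for a ReLU code is the fraction of hidden units that are exactly zero, sketched here under that assumption:

```python
import numpy as np

def sparsity(h, tol=1e-8):
    """Fraction of hidden units that are (numerically) zero.

    With a ReLU encoder, units clipped at zero are fully inactive,
    so this measures how sparse the latent code h is.
    """
    h = np.asarray(h)
    return float(np.mean(np.abs(h) <= tol))

h = np.array([0.0, 0.0, 1.3, 0.0, 2.7, 0.1])
print(sparsity(h))  # 0.5
```

Averaging this quantity over all training (or testing) examples gives one point on the sparsity-vs-dropout_rate curve.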

6. Weight matrix of the encoder

The encoder weight matrix, $W_1$, is shown as a grey-scale heatmap. Each subplot shows one row of $W_1$ reshaped to 28x28.
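Since each row of $W_1$ connects one hidden unit to all 784 pixels, it can be reshaped to 28x28 and viewed as the image feature that unit responds to. A sketch of tiling all 196 rows into one 14x14 grid of subimages (with placeholder weights):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(196, 784))  # placeholder encoder weights

# One 28x28 "filter" image per hidden unit.
filters = W1.reshape(196, 28, 28)

# Tile the 196 filters into a 14x14 grid for a single grey-scale heatmap
# (e.g. via matplotlib's imshow with cmap="gray").
grid = (filters.reshape(14, 14, 28, 28)
               .transpose(0, 2, 1, 3)
               .reshape(14 * 28, 14 * 28))
print(grid.shape)  # (392, 392)
```

The transpose interleaves block indices and pixel indices so each filter occupies a contiguous 28x28 tile in the final image.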

7. Original image vs latent space vs reconstructed image

8. Latent space similarity plot
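The similarity measure is not specified; cosine similarity between latent codes is one common choice, sketched here under that assumption:

```python
import numpy as np

def cosine_similarity_matrix(H):
    """Pairwise cosine similarity between rows of H (one latent code per row)."""
    H = np.asarray(H, dtype=float)
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    H_unit = H / np.clip(norms, 1e-12, None)  # guard against zero-norm codes
    return H_unit @ H_unit.T

# Toy 2-D latent codes for illustration.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S = cosine_similarity_matrix(H)
print(np.round(S, 3))
```

The resulting matrix `S` can be rendered directly as a heatmap, ideally with examples sorted by class so block structure becomes visible.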

9. Latent space in t-SNE space
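A sketch of projecting the 196-dim latent codes to 2-D with scikit-learn's t-SNE; the codes below are random placeholders, and the perplexity value is an illustrative choice (it must be smaller than the number of samples):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
H = rng.normal(size=(50, 196))  # placeholder latent codes, one 196-dim row each

# Project to 2-D for visual inspection; color points by digit label when plotting.
embedding = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(H)
print(embedding.shape)  # (50, 2)
```

With trained codes, well-separated digit classes typically show up as distinct clusters in this 2-D embedding.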

10. k-means plot
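A sketch of clustering the latent codes with scikit-learn's k-means; 10 clusters is a natural choice for 10 digit classes, though the source does not fix the number, and the codes below are placeholders:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
H = rng.normal(size=(100, 196))  # placeholder latent codes

# Cluster the latent codes; the fitted labels can then color the t-SNE
# embedding from the previous section, or be compared against true digit labels.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(H)
print(kmeans.labels_.shape)  # (100,)
```

Agreement between cluster assignments and digit labels gives a rough measure of how class-structured the latent space is at each dropout_rate.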